Hidden Markov models for detecting remote protein homologies

Authors

  • Kevin Karplus
  • Christian Barrett
  • Richard Hughey
Abstract

MOTIVATION: A new hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences is described and evaluated. The method begins with a single target sequence and iteratively builds a hidden Markov model (HMM) from the sequence and homologs found using the HMM for database search. SAM-T98 is also used to construct model libraries automatically from sequences in structural databases.

METHODS: We evaluate the SAM-T98 method with four datasets. Three of the test sets are fold-recognition tests, where the correct answers are determined by structural similarity. The fourth uses a curated database. The method is compared against WU-BLASTP and against DOUBLE-BLAST, a two-step method similar to ISS, but using BLAST instead of FASTA.

RESULTS: SAM-T98 had the fewest errors in all tests, dramatically so for the fold-recognition tests. At the minimum-error point on the SCOP (Structural Classification of Proteins) domains test, SAM-T98 got 880 true positives and 68 false positives, DOUBLE-BLAST got 533 true positives with 71 false positives, and WU-BLASTP got 353 true positives with 24 false positives. The method is optimized to recognize superfamilies, and would require parameter adjustment to be used to find family or fold relationships. One key to the performance of the HMM method is a new score-normalization technique that compares the score to the score with a reversed model rather than to a uniform null model.

AVAILABILITY: A World Wide Web server, as well as information on obtaining the Sequence Alignment and Modeling (SAM) software suite, can be found at http://www.cse.ucsc.edu/research/compbio/

CONTACT: [email protected]; http://www.cse.ucsc.edu/karplus
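
The reversed-model normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the SAM implementation: it assumes a toy two-state discrete HMM (the hypothetical `TOY` parameters below) rather than a profile HMM with match, insert, and delete states, and it assumes that scoring against a reversed model can be approximated by scoring the reversed sequence against the same model. The function names `log_forward` and `reverse_null_score` are illustrative only.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(seq, start, trans, emit):
    """Log-likelihood of seq under a discrete HMM (forward algorithm)."""
    n_states = len(start)
    # alpha[s] = log P(symbols seen so far, current state = s)
    alpha = [math.log(start[s]) + math.log(emit[s][seq[0]])
             for s in range(n_states)]
    for symbol in seq[1:]:
        alpha = [
            _logsumexp([alpha[r] + math.log(trans[r][s])
                        for r in range(n_states)])
            + math.log(emit[s][symbol])
            for s in range(n_states)
        ]
    return _logsumexp(alpha)


def reverse_null_score(seq, start, trans, emit):
    """Log-odds of seq against its own reversal under the same model.

    Assumption of this sketch: the reversed sequence stands in for the
    reversed model. It has the same length and composition as seq, so
    length and composition effects cancel in the difference.
    """
    forward = log_forward(seq, start, trans, emit)
    reverse = log_forward(seq[::-1], start, trans, emit)
    return forward - reverse


# Hypothetical toy model: state 0 prefers 'A', state 1 prefers 'B',
# and the chain tends to move from state 0 to state 1.
TOY = {
    "start": [0.9, 0.1],
    "trans": [[0.5, 0.5], [0.1, 0.9]],
    "emit": [{"A": 0.9, "B": 0.1}, {"A": 0.1, "B": 0.9}],
}

if __name__ == "__main__":
    for s in ("AAABBB", "BBBAAA", "ABBA"):
        print(s, round(reverse_null_score(s, **TOY), 3))
```

In this toy setting a sequence that matches the model's left-to-right order scores above zero, its reversal scores below zero by the same amount, and a palindrome scores exactly zero, which is the behavior a composition-matched null is meant to provide.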


Similar resources

A Discriminative Framework for Detecting Remote Protein Homologies

A new method for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a generative statistical model for a protein family, in this case a hidden Markov model. This general approach of combining generative m...


Using the Fisher Kernel Method to Detect Remote Protein Homologies

A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hidden Markov model. The general approach of combining generative models like HMMs with discriminati...


REMOTE HOMOLOGY DETECTION WITH HMMs AND STRUCTURAL ISSUES

Computational methods for homology detection between protein sequences have become a central component in genome analysis. Nowadays, sequences of unknown function are routinely searched against databases of known proteins, providing an important aid for sequence annotation and for guiding laboratory experiments. Although homology identification through pairwise sequence matching [1, 2] is still...


Features Extraction For Protein Homology Detection Using Hidden Markov Models Combining Scores

A few years back, Jaakkola and Haussler published a method of combining generative and discriminative approaches for detecting protein homologies. The method was a variant of support vector machines using a new kernel function called Fisher Kernel. They begin by training a generative hidden Markov model for a protein family. Then, using the model, they derive a vector of features called Fisher sc...


Protein Sequences Classification Based on String Weighting Scheme

Motivation: We present a new technique to recognize remote protein homologies that relies on combining probabilistic modeling and supervised learning in high-dimensional feature spaces. The main novelty of our technique is the method of constructing feature vectors using Hidden Markov Model and the combination of this representation with a classifier capable of learning in very sparse high-dimension...



Journal title:
  • Bioinformatics

Volume 14, Issue 10

Pages -

Publication date: 1998